Sea turtles are reptiles listed on the International Union for Conservation of
Nature (IUCN) Red List of Threatened Species and in Appendix I of the
Convention on International Trade in Endangered Species of Wild Fauna and
Flora (CITES) as species threatened with extinction. Sea turtles are close to
extinction because of natural predators and because people are frequently
mistaken about, or unaware of, which turtles should not be caught. The aim of
this study was to develop a system to help classify sea turtle species. To
that end, an ensemble deep learning method based on convolutional neural
networks (CNN) with transfer learning is proposed for classifying the turtle
species found by coastal communities. Five well-known CNN models were
trained (VGG-16, ResNet-50, ResNet-152, Inception-V3, and DenseNet201), and
the three most successful were selected for the ensemble method. The final
result is obtained by combining the predictions of the selected CNN models
with the ensemble method at test time. The evaluation shows that the VGG16 -
DenseNet201 ensemble is the best ensemble model, with accuracy,
precision, recall, and F1-Score values of 0.74, 0.75, 0.74, and 0.76,
respectively. The ensemble also outperforms the individual models it is
built from.

1. INTRODUCTION 


Turtles are reptiles that are easily recognized by their distinctive body shape, from the head to the
carapace, or dorsal (back) shell [1]. There are three main groups of turtles: land turtles, aquatic turtles, and
marine turtles. Marine turtles are also known as sea turtles. There are seven species of sea turtles in the world
[2]-[5], six of which can be found in Indonesia: green turtles (Chelonia mydas), hawksbill turtles
(Eretmochelys imbricata), olive ridley turtles (Lepidochelys olivacea), flatback turtles (Natator depressus),
leatherback turtles (Dermochelys coriacea), and loggerhead turtles (Caretta caretta) [6]. According to data
from the Bengkulu Province Marine and Fisheries Service, only four species visit the Bengkulu coast:
green turtles, hawksbill turtles, loggerhead turtles, and olive ridley turtles.

Sea turtles are currently threatened with extinction and are listed as endangered reptiles on the
International Union for Conservation of Nature (IUCN) Red List and in Appendix I of the Convention on
International Trade in Endangered Species (CITES) as species threatened with extinction [7]-[9]. Sea turtles
are endangered because of threats from humans and animal predators. Humans and predators take
turtle eggs as a source of protein, and in traditional rituals, turtle shells are used as accessories [10]. Coastal
communities, and the public in general, often cannot distinguish the types of turtles they find on the coast
because the species look very similar. This similarity is also an obstacle when reporting turtle findings to
conservation authorities. In addition, reports of turtle findings are still handled manually, so handling and
rescuing turtles takes a long time. The delay hinders conservationists from making semi-natural nests for
sea turtles, which results in increased mortality and eggs failing to hatch. Therefore, to reduce illegal
fishing and assist in the conservation of sea turtles, technology is needed to classify turtle species.

Deep learning is a recent and popular classification technology that can handle vast volumes
of data. One of its benefits is transfer learning, in which a model learned for one task can be
applied to other tasks with limited data [11]-[13]. Deep learning, particularly the convolutional neural
network (CNN), which is inspired by the mammalian visual cortex, can evaluate and learn a huge number of
features on its own, including some not previously considered by experts [14]. Few previous studies on
turtle classification systems could be found. The related studies are as follows. Liu et al. [15] classified
turtles using CNN-based deep learning with and without transfer learning (TL): LeNet, AlexNet, VGG16,
VGG16-TL, InceptionV3, and InceptionV3-TL achieved average accuracies of 65.2%, 80.6%, 84.4%, 91.4%,
87.2%, and 96.4%, respectively. Paixao et al. [16]
developed a texture-based classification system for five species of sea turtles found on the coast of Brazil.
The methods used were k-nearest neighbors (KNN) and support vector machine (SVM) with color histogram
and chromaticity moment features. The KNN method was reported to be better than SVM, with a global
accuracy of 0.74. Yussof et al. [17] developed a sea turtle identification system using transfer learning, the
AlexNet CNN, and SVM. The dataset was sourced from the Biodiversity Research Center, Academia Sinica,
Taiwan. The highest accuracy of the classification system was 62.9%. Dunbar et al. [18] studied
the practical use of photographic identification (PID) methods to identify sea turtles. PID case studies were
conducted to identify sea turtles in Reunion Island (France), Roatan (Honduras), and the Republic of
Maldives. The results show that PID can be an effective and efficient method for gathering information
about animals.

Unlike the studies mentioned above, the learning method used in this study is based on deep learning
training via transfer learning with appropriate pre-trained models, a well-known and successful approach
[19]. The strengths of several transfer learning models are then combined in what is known as a "deep
learning ensemble" [20]. The models considered are VGG-16, ResNet-50, InceptionV3, DenseNet201, and
ResNet-152 [21]. The training results show that each CNN model generalizes differently on the dataset.
Based on these observations, the three most successful CNN models, VGG-16, InceptionV3, and
DenseNet201, were selected for the ensemble method. The classification results obtained from the selected
CNN models are combined using the ensemble average voting method to reach the final output of the
classification. This ensemble method produced satisfactory classification results.
This study therefore proposes an ensemble method that uses three transfer learning models to strengthen
the final decision, and it observes the use of original and augmented data in the model.


2. METHOD 

This study follows the cross-industry standard process for data mining (CRISP-DM)
method. CRISP-DM was developed in 1996 by analysts from several companies, including
DaimlerChrysler (then Daimler-Benz), SPSS, and NCR. CRISP-DM can be used as a general
problem-solving strategy for a business or research unit [22]-[24]. The flow of this method can be seen in
Figure 1.

The CRISP-DM method begins with the business understanding phase, which determines the
direction of the research to be carried out. It continues with the data understanding phase, which
identifies the data needed to meet the business goals. The data preparation phase then improves data
quality so that the data suit the modeling process to be carried out. The modeling phase involves the
creation of a model and its training on the prepared data. In the evaluation phase, the model produced
in each iteration is evaluated. The last
phase is the deployment phase, in which the model is implemented on the desired platform.

Figure 1. CRISP-DM method [24] 


2.1. Business understanding 

Coastal communities, and the public in general, often cannot distinguish
the types of turtles they find on the coast. Reports of turtle findings are still handled manually, so
handling and rescuing turtles takes a long time. This problem hinders conservation parties from making
semi-natural nests, which results in increased mortality and eggs that fail to hatch. To reduce threats and aid
conservation, a technology capable of classifying turtle species is required. One of the emerging and popular
technologies for classification is deep learning, which can classify images or
videos. An advantage of deep learning is transfer learning, which means that a model learned
for one task can be reapplied to another task that may have limited data. The performance of transfer
learning can be improved by combining several transfer learning models, an approach also known as an
"ensemble deep learning model". The deep
learning ensemble model generated in this study can be implemented in a web- or mobile-based system to
assist the classification and reporting process when the community finds turtles. As a result, this system is
expected to help the community and conservation organizations protect turtles by providing a
system for classifying and reporting turtle findings that can be accessed via cellphones or personal
computers.


2.2. Data understanding 

Data needs were analyzed and data were collected in three ways: literature
study, observation, and interviews. The dataset consists of turtle images in 4 classes, corresponding to the
types of turtles validated by experts: green turtles, hawksbill turtles, olive ridley turtles, and
loggerhead turtles. Figure 2 shows example sea turtle images from the dataset. Figures 2(a) to 2(d) show
green turtle (Chelonia mydas), olive ridley (Lepidochelys olivacea), hawksbill turtle (Eretmochelys
imbricata), and loggerhead turtle (Caretta caretta), in that order. Before data augmentation, the images
already covered turtles in different positions, both on the coast and
at sea.


Figure 2. Sample image of sea turtles in the dataset: (a) green turtle (Chelonia mydas), (b) olive ridley
(Lepidochelys olivacea), (c) hawksbill turtle (Eretmochelys imbricata), (d) loggerhead turtle
(Caretta caretta)

The image dataset used in this study consists of a primary dataset and a secondary dataset of images
taken from public datasets. The primary dataset of 654 images was collected by the research team at the
“Konservasi Penyu Alun Utara” located in Pekik Nyaring Village, Central Bengkulu Regency, Bengkulu
Province, Indonesia. The secondary dataset of 850 images was taken from Smaranjit Ghose's public
dataset on Kaggle [25]. The composition of the dataset is shown in Table 1.


Table 1. Dataset composition

No  Name of data           Number of original images
1   Green Turtles          376
2   Olive Ridley Turtles   376
3   Loggerhead Turtles     376
4   Hawksbill Turtles      376
    Total                  1504


2.3. Data preparation 

At the data preparation stage, the research team resized, augmented, and split the dataset.
Each image was resized to 224x224 px and then augmented with rotation, noise,
brightness, and blur techniques. Each augmentation was applied with a random value drawn from a
range: rotation, -40 to 40; noise, 1 to 5%; brightness, -25% to +25%; blur, 1 to 5 px. The dataset was
then divided into three parts: training data, validation data, and test data. The
distribution of the dataset after this process is shown in Table 2.


Table 2. Split dataset

No  Name of data  Number of images
1   Training      70% of the total = 4228
2   Validation    20% of the total = 1208
3   Test          10% of the total = 580
    Total         6016
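
The augmentation ranges described in this section can be illustrated with a small parameter sampler. This is a minimal sketch rather than the pipeline used in the study: the names `AugmentationParams` and `sample_augmentation` are hypothetical, uniform sampling is an assumption, and the rotation range is assumed to be in degrees because the text gives no unit.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class AugmentationParams:
    """One randomly drawn augmentation setting (hypothetical container)."""
    rotation_deg: float      # assumed unit: degrees, -40 to 40
    noise_fraction: float    # noise level as a fraction, 0.01 to 0.05
    brightness_delta: float  # relative brightness change, -0.25 to +0.25
    blur_px: float           # blur size in pixels, 1 to 5


def sample_augmentation(rng: random.Random) -> AugmentationParams:
    """Draw one setting uniformly (an assumption) from the stated ranges."""
    return AugmentationParams(
        rotation_deg=rng.uniform(-40.0, 40.0),
        noise_fraction=rng.uniform(0.01, 0.05),
        brightness_delta=rng.uniform(-0.25, 0.25),
        blur_px=rng.uniform(1.0, 5.0),
    )


rng = random.Random(0)  # fixed seed so the sketch is reproducible
settings = [sample_augmentation(rng) for _ in range(3)]
```

Each sampled setting would then be passed to whatever image library performs the actual rotation, noise, brightness, and blur operations.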


2.4. Modeling 

The ensemble deep learning in this study uses the average voting
strategy. In this method, each model outputs class probabilities for every data point, and
the ensemble classifier takes the average of these predictions over all models and uses it to make
the final prediction. At this stage, the ensemble deep learning is simulated by adjusting the parameters to
produce the best model. The parameters to be set in the model training process are input, batch size,
epoch, sea turtle dataset, hyperparameter, and weight evaluation. The same parameter settings are applied
to five types of transfer learning: InceptionV3, DenseNet, VGG16, ResNet50, and ResNet152. The three
best-performing of the five models are then selected for the ensemble model. Details of the design stages
of the sea turtle classification model are shown in Figure 3.
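
The average voting strategy can be illustrated with a short sketch. It assumes each model returns one probability per class for a given image; the function names, the class order in `CLASS_NAMES`, and the example probabilities are hypothetical and are not taken from the study.

```python
from typing import List, Sequence

# Assumed class order; the study does not state one.
CLASS_NAMES = ["green", "olive_ridley", "loggerhead", "hawksbill"]


def average_vote(model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-class probabilities of several models for one image."""
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    if any(len(p) != n_classes for p in model_probs):
        raise ValueError("all models must output the same number of classes")
    return [sum(p[c] for p in model_probs) / n_models for c in range(n_classes)]


def predict_class(model_probs: Sequence[Sequence[float]]) -> str:
    """Return the class with the highest averaged probability."""
    averaged = average_vote(model_probs)
    best_index = max(range(len(averaged)), key=averaged.__getitem__)
    return CLASS_NAMES[best_index]


# Hypothetical softmax outputs of two models for a single image.
model_a = [0.50, 0.30, 0.10, 0.10]
model_b = [0.20, 0.60, 0.10, 0.10]
prediction = predict_class([model_a, model_b])
```

In this example, model A alone would choose the green turtle class, but the averaged probabilities favor olive ridley because model B is more confident in that class.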

As the detailed steps in Figure 3 show, the training dataset is used to train and validate
transfer learning models with the InceptionV3, DenseNet, VGG16, ResNet50, and ResNet152 architectures.
A test dataset is used to evaluate the performance of the model produced by each architecture. The three
best models were chosen based on their validation performance and their evaluation on the test dataset,
and their average vote is used as the final ensemble model for the turtle classification system. In the
training process, the initial weights of each pre-trained model come from training on the ImageNet
dataset; only the feature extraction layers are kept,
while the last dense layer is replaced with a fully connected layer for the
sea turtle classifier. The training process is evaluated based on loss and accuracy on the training and
validation datasets, as well as the values of precision, recall, and F1-Score.

Five well-known CNN architectures (InceptionV3, DenseNet, VGG16, ResNet50, and ResNet152)
were trained in the study with a batch size of 16 and a learning rate of 0.0001. All models were trained
with the same number of epochs (300). A callbacks list was used to save the accuracy of the model during
training. Adam was
used as the optimization function to minimize the categorical cross-entropy loss function. The softmax
activation function was used in the last layer for classification. Early stopping was used to reduce
overfitting. All experiments were carried out on a Windows-based PC with 4 GB of RAM, a 4 GB
hard drive, and a 256-bit Nvidia Core i5 graphics card. Python and the Keras module
were used for software development. The three most successful CNN models were chosen for the
ensemble approach from among the five.
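
The early stopping rule mentioned above can be sketched in plain Python. The text does not give the monitored quantity, the patience, or the minimum improvement, so monitoring validation loss, `patience=3`, and `min_delta=0.0` are assumptions chosen only for illustration, and `early_stopping_epoch` is a hypothetical helper rather than part of Keras.

```python
from typing import Iterable, Tuple


def early_stopping_epoch(val_losses: Iterable[float], patience: int = 3,
                         min_delta: float = 0.0) -> Tuple[int, float]:
    """Return (stop_epoch, best_loss) for a sequence of validation losses.

    Training stops once the loss has failed to improve by more than
    min_delta for `patience` consecutive epochs. Epochs are 1-indexed.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            # New best value: remember it and reset the counter.
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return epoch, best_loss


# Hypothetical validation losses: improvement stalls after epoch 3.
losses = [1.20, 0.90, 0.80, 0.85, 0.82, 0.81, 0.79]
stop_epoch, best = early_stopping_epoch(losses, patience=3)
```

With these example values, training halts at epoch 6 and never reaches the lower loss at epoch 7, which shows the trade-off that the patience setting controls.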


Figure 3. Model deep learning (flowchart stages: pre-processing with resize, training, validation, and testing)


2.5. Evaluation 


The performance of the classification model is evaluated based on precision, recall, accuracy, and 
F1 Score. These metrics are calculated based on true positive (TP), true negative (TN), false positive (FP), 
and false negative (FN) data from the confusion matrix based on (1)-(4). TP is the number of true positive 
predictions, TN is the number of true negative predictions, FP is the number of false positive predictions, and 
FN is the number of false negative predictions [26]-[28]. 


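$$\text{Precision} = \frac{TP}{TP + FP} \tag{1}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{2}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + FN + TN + FP} \tag{3}$$

$$\text{F1-Score} = 2 \times \frac{\text{recall} \times \text{precision}}{\text{recall} + \text{precision}} \tag{4}$$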
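
As an illustration of (1)-(4), the metrics can be computed per class from a confusion matrix. This is a minimal sketch under stated assumptions: the function names are hypothetical, rows are taken to be true classes and columns predicted classes, each class is scored one-vs-rest, and the example matrix is invented rather than taken from the study.

```python
from typing import Dict, List


def per_class_metrics(confusion: List[List[int]]) -> List[Dict[str, float]]:
    """Compute precision, recall, and F1 for each class (one-vs-rest).

    confusion[i][j] counts images of true class i predicted as class j.
    """
    n = len(confusion)
    results = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, truly other
        fn = sum(confusion[k]) - tp                       # truly k, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append({"precision": precision, "recall": recall, "f1": f1})
    return results


def accuracy(confusion: List[List[int]]) -> float:
    """Fraction of all images on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total if total else 0.0


# Hypothetical 2-class confusion matrix, not data from the study.
cm = [[8, 2],
      [1, 9]]
metrics = per_class_metrics(cm)
overall = accuracy(cm)
```

For more than two classes, overall accuracy is computed here as correct predictions divided by all predictions, which reduces to (3) in the two-class case.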


3. RESULTS AND DISCUSSION 

The evaluation metrics for the trained InceptionV3, DenseNet, VGG16, ResNet50,
and ResNet152 models in this study are shown in Table 3. The three best models are InceptionV3,
DenseNet201, and VGG16. Example train-validation loss and accuracy graphs of the DenseNet201 are
shown in Figure 4 and Figure 5. The figures show that the loss and accuracy of the DenseNet201
model begin to converge after about 25 epochs of training. The other models also started to
converge at around 25 epochs.


Table 3. Evaluation metrics for the trained models of the sea turtle classifier

No  Models       Accuracy  Precision  Recall  F1-Score
1   InceptionV3  0.60      0.58       0.60    0.56
2   DenseNet201  0.70      0.76       0.70    0.70
3   VGG16        0.69      0.71       0.69    0.68
4   ResNet50     0.29      0.14       0.29    0.19
5   ResNet152    0.40      0.44       0.40    0.39

Figure 4. InceptionV3 train and validation loss (x-axis: epoch, 0 to 300; series: train, valid)

Figure 5. InceptionV3 train and validation accuracy 


Ensemble models were built from the three best models in four combinations: InceptionV3 - VGG16,
InceptionV3 - DenseNet201, VGG16 - DenseNet201, and InceptionV3 - DenseNet201 -
VGG16. The evaluation metrics for the ensemble models are shown in Table 4. The best
ensemble model is VGG16 - DenseNet201, with accuracy, precision, recall, and F1-Score of 0.74, 0.75, 0.74,
and 0.76, respectively. This ensemble improves on the original models: its accuracy of 0.74 and F1-Score
of 0.76 compare with 0.70 and 0.70 for DenseNet201 and 0.69 and 0.68 for VGG16 in Table 3.

Table 4. Evaluation metrics for the ensemble trained models of the sea turtle classifier

No  Ensemble Models                    Accuracy  Precision  Recall  F1-Score
1   InceptionV3 - VGG16                0.70      0.71       0.70    0.69
2   InceptionV3 - DenseNet201          0.72      0.75       0.72    0.72
3   VGG16 - DenseNet201                0.74      0.75       0.74    0.76
4   InceptionV3 - DenseNet201 - VGG16  0.70      0.72       0.70    0.70
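
The four ensembles in Table 4 are exactly the subsets of the three best models that contain at least two members. A brief sketch of how such candidates could be enumerated is given here; the variable names are hypothetical and the study does not state how the combinations were generated.

```python
from itertools import combinations

best_models = ["InceptionV3", "DenseNet201", "VGG16"]

# All subsets with at least two members: three pairs and one triple.
ensembles = [
    combo
    for size in range(2, len(best_models) + 1)
    for combo in combinations(best_models, size)
]
```

Each candidate would then be scored on the test set with average voting, and the best-scoring combination kept.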


Table 5 compares the per-class performance of the VGG16 - DenseNet201 ensemble model
with that of the original models (VGG16 and DenseNet201); in this case, only precision, recall, and
F1-Score were observed. The ensemble model shows better performance overall. Evaluation metrics
increased for all classes except the green turtles class, where the ensemble model has lower
precision than the VGG16 model and lower recall than the DenseNet201 model. Taken as a whole,
the ensemble model classifies the turtle classes better
than the original models.


Table 5. Comparison of transfer learning VGG16, DenseNet201 (D201), and ensemble model (EM)

Sea Turtles            Precision           Recall              F1-Score
                       D201  VGG16  EM     D201  VGG16  EM     D201  VGG16  EM
Green Turtles          0.54  0.71   0.65   0.83  0.37   0.76   0.65  0.48   0.70
Olive Ridley Turtles   0.81  0.79   0.84   0.84  0.86   0.86   0.84  0.83   0.85
Loggerhead Turtles     0.50  0.71   0.81   0.47  0.67   0.57   0.62  0.69   0.67
Hawksbill Turtles      0.52  0.58   0.71   0.56  0.88   0.74   0.60  0.70   0.72


4. CONCLUSION 

In this study, a deep learning ensemble CNN-based marine turtle classification model was
developed. The ensemble model was selected from the InceptionV3, DenseNet, VGG16, ResNet50, and
ResNet152 architectures. Individual model evaluation showed that the three best models are
InceptionV3, DenseNet201, and VGG16. Ensemble models were built from these three models in four
combinations: InceptionV3 - VGG16, InceptionV3 - DenseNet201, VGG16 - DenseNet201, and
InceptionV3 - DenseNet201 - VGG16. VGG16 - DenseNet201 is the best ensemble model obtained, with
accuracy, precision, recall, and F1-Score values of 0.74, 0.75, 0.74, and 0.76, respectively. The result shows
that the ensemble model outperforms the original models. Based on the performance of the classifier for each
class prediction, all evaluation metrics show an improvement except for the green turtles class. For that
class, the ensemble precision (0.65) is lower than the VGG16 precision (0.71) and the ensemble recall (0.76)
is lower than the DenseNet201 recall (0.83), but the F1-Score still improved from 0.65 to 0.70. Overall, this
study shows that the ensemble method can classify better than the individual models.